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Abstract 

Background: Recurrent events data analysis is common in biomedicine. Literature review indicates tliat most 
statistical models used for such data are often based on time to the first event or consider events within a subject 
as independent. Even when taking into account the non-independence of recurrent events within subjects, data 
analyses are mostly done with continuous risk interval models, which may not be appropriate for treatments 
with sustained effects (e.g., drug treatments of malaria patients). Furthermore, results can be biased in cases of a 
confounding factor implying different risk exposure, e.g. in malaria transmission: if subjects are located at zones 
showing different environmental factors implying different risk exposures. 

Methods: This work aimed to compare four different approaches by analysing recurrent malaria episodes from a 
clinical trial assessing the effectiveness of three malaria treatments [artesunate + amodiaquine (AS + AQ), artesunate + 
sulphadoxine-pyrimethamine (AS + SP) or artemether-lumefantrine (AL)], with continuous and discontinuous risk 
intervals: Andersen-Gill counting process (AG-CP), Prentice-Williams-Peterson counting process (PWP-CP), a shared 
gamma frailty model, and Generalized Estimating Equations model (GEE) using Poisson distribution. Simulations 
were also made to analyse the impact of the addition of a confounding factor on malaria recurrent episodes. 

Results: Using the discontinuous interval analysis, AG-CP and Shared gamma frailty models provided similar 
estimations of treatment effect on malaria recurrent episodes when adjusted on age category. The patients 
had significant decreased risk of recurrent malaria episodes when treated with AS + AQ or AS -i- SP arms compared 
to AL arm; Relative Risks were: 0.75 (95% CI (Confidence Interval): 0.62-0.89), 0.74 (95% CI: 0.62-0.88) respectively for 
AG-CP model and 0.76 (95% CI: 0.64-0.89), 0.74 (95% CI: 0.62-0.87) for the Shared gamma frailty model. 
With both discontinuous and continuous risk intervals analysis, GEE Poisson distribution models failed to detect the 
effect of AS + AQ arm compared to AL arm when adjusted for age category. The discontinuous risk interval analysis 
was found to be the more appropriate approach. 

Conclusion: Repeated event in infectious diseases such as malaria can be analysed with appropriate existing models 
that account for the correlation between multiple events within subjects with common statistical software packages, 
after properly setting up the data structures. 
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Background 

Recurrent events data analysis is quite common in bio- 
medicine, such as low back pain, sick leave from work, 
sporting injuries, hospital readmissions and episodes of 
infectious diseases such as malaria [1-7]. Literature review 
indicates that most statistical models applied to such data 
are often based on naive techniques. Such naive tech- 
niques are characterized by either ignoring the existence 
of recurrent events, or ignoring the fact that the recurrent 
events within subjects are correlated [1,2]. Even when 
taking into account the non-independence of recurrent 
events within subjects, data analyses are mostly done with 
continuous risk interval models [2-5], which may not be 
relevant for health conditions with discontinuous risk [8]. 
In the medical field, it is quite common to encounter 
recurrent health conditions with such discontinuous risk 
intervals, e.g. in cases with persistent treatment effect. 
Examples include infections, such as malaria, disability 
episodes, hospitalizations, and nursing home admissions 
[7-12]. When subjects have a disability episode, they are 
not at risk of the second episode of disability until they 
have recovered from the first episode. To obtain unbiased 
estimates of incidence rates, the person-time period when 
the subject is not at risk should be excluded from the risk 
set. When analysing recurrent time-to-event outcomes 
with discontinuous risk intervals the subject is not at risk 
of another event while a previous one is ongoing or if the 
subject is under treatment. Appropriate models for analys- 
ing recurrent events data include marginal models or 
frailty or random coefficient analysis models [8-13], which 
take into account the non independence assumption of 
events within the subject. 

Furthermore, in the case of malaria treatment trials, 
investigators assume that randomization is sufficient for 
controlling differential risk. However, the location of 
each subject is important as the risk exposure shows 
high spatial variations due to different environmental 
factors [14,15] that must be taken into account. 

This work aimed at comparing different approaches 
analysing recurrent malaria episodes, with continuous 
and discontinuous risk interval models, in order to con- 
tribute identifying useful models to analyse such malaria 
data. 

Methods 

Study design 

Data were collected from July 2005 to July 2007 in 
Bougoula-Hameau, Sikasso, in the south region of Mali. 
Patients were randomized as they came to the health 
centre from July 2005 to October 2006 (accrual period) to 
one of the artemisinin-based combination therapy (ACT) 
arm: artesunate + amodiaquine (AS + AQ) or artesunate + 
sulphadoxine-pyrimethamine (AS + SP) or artemether- 
lumefantrine (AL). The study's main objective was to 



compare malaria incidence between these three ACT. Pa- 
tients received the same initial treatment at each subse- 
quent episode of uncomplicated malaria during the course 
of the study. The clinical data and results related to the 
main objective were published elsewhere [7]. 

Statistical models and data analysis 

Four models were used for the analysis of recurrent time- 
to-event outcomes: i) Generalized estimating equations 
(GEE) model using a Poisson distribution; and three ex- 
tended Cox models: ii) the Andersen-Gill counting process 
(AG -CP), iii) the Prentice- Williams -Peterson counting 
process (PWP-CP); and iv) the Shared gamma fraOty model. 
To take into account the recurrent structure of the data, 
models i) to iii) are marginal and model iv) uses a shared 
frailty term. 

Both continuous and discontinuous time interval ap- 
proaches were used with the four different models to 
analyse the data. A 14 day washout period was estimated 
after each episode treatment based on the pharmacokin- 
etics of the study drugs. Model results were evaluated by 
comparing the risk ratio (RR) estimates and their stand- 
ard errors (SE). 

To analyse the impact of a confounding factor on re- 
sults, a simulation of a risk exposure was done with two 
levels (high risk exposure versus low risk exposure) using a 
Binomial distribution with parameter (probability of being 
in the high risk exposure class) depending on the number 
of malaria episodes: 0.99 if the subject experienced >6 mal- 
aria episodes else 0.75 if the subject experienced > 4 mal- 
aria episodes, else 0.25 if subject experienced >1 episode 
and otherwise 0.01. This binary factor simulates, for 
example, two zones of different exposure. For analys- 
ing the impact of such a confounding factor, 1,000 in- 
dependent replicates were performed, using R2.15.2 
software (The R Foundation for Statistical Computing, 
Vienna, Austria). With the simulation data, only dis- 
continuous time interval analysis was done with each 
of the four models. The impact of the simulated con- 
founding factor on the results for each model was 
assessed by comparing the magnitude and confidence 
intervals of RR estimates, power and empirical cover- 
age rates (ECR). For each covariate, the ratio between 
the standard error (SE) of the estimates using simulated 
data and the SE of the estimates using the observed data 
was computed. These criteria can be interpreted as the 
impact of the confounding factor on the estimation 
accuracy. 

GEE model using Poisson distribution 

The Poisson regression model is frequently used to ana- 
lyse count data or to study disease incidence and mortal- 
ity when the dependent variable represents the number 
of independent events that occur during a fixed period 
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of time [16]. The conditional mean of Y (number of 
events) can be written as: 



Ln{Y\X,/3)=Xi/^ 



(1) 



Where, = ySo + /S A + /S2^2 + • • • + jS„X„; Ln is the 
natural logarithm (the canonical link between the linear 
predictors and the conditional mean of Y). 

The GEE Poisson estimates the same model as the 
standard Poisson regression allowing for dependence 
within clusters. Therefore, it is appropriate to model re- 
current events within a subject, such as in longitudinal data. 
The regression coefficients are refit, correcting iteratively 
for the correlation. In such models, the within-subject cor- 
relation structure is treated as a nuisance parameter. In this 
work the exchangeable correlation structure has been used 
assuming that the correlation between events remained 
constant through the time [17]. 

Extended Cox models 

The extended Cox models are used to model recurrent 
events within a subject unlike the Cox model, which is 
used to model a unique event or, sometimes, the first 
event. The considered Cox extended models were: the 
counting process model (Anderson-GUI model or AG-CP) 
[18], and the conditional model (Prentice-WUliams-Peter- 
son counting process model or PWP-CP) [16]. 

The sandwich robust standard error of Lin and Wei 
[3,19], which is a variance-correction technique, is usu- 
ally employed together with these Cox extended models 
to avoid inflation of type I error due to multiple observa- 
tions per individual which do not require specification of 
the correlation matrix. 



The Anderson-Gill model (AG-CP) 

The formula is written as: 



(2) 



Xi//t) represents the hazard function for the event of 
the subject at time t; \o(t) represents the common 
baseline hazard for all events over time; Xi/^ represents 
the vector of p covariates processes for the f'' individual; 
j6 is a fixed vector of p coefficients; 1,^^ is a predictable 
process, taking values in {1,0} indicating when the f'' in- 
dividual is under observation. 

The AG-CP model uses this counting process time- 
scale for all episodes. The time-scale does not reset to 0 
after an episode (Table 1). Data for each subject needs to 
be entered in the counting process style, with a start time, 
stop time and censoring indicator for each event. 



Conditional model (Prentice-Williams-Peterson counting 
process-PWP-CP): 

The PWP-CP model is similar to the AG-CP model but 
stratified by events. The formula is written as: 



(3) 



Xok( f) represents the event-specific baseline hazard for the 
/c"' event over time. In this model, a subject is assumed 
not to be at risk for a subsequent event until a current 
event has terminated. 

The shared frailty model 

The frailty model, introduced in the biostatistical litera- 
ture by Vaupel et al. [20], and discussed in detail by 
Hougaard, Duchateau and Janssen, and Wienke et al. 
[21-23], accounts for the heterogeneity in baseline. This 
model is an extension of the proportional hazards model 
in which the hazard function depends upon an unob- 
servable random variable. Subjects may be exposed to 
different risk levels, even after controlling for known risk 
factors, because of some relevant unobserved covariates. 
The frailty parameter models these unknown covariates. 
In a shared fi'ailty model, individuals in the same group 
share the same frailty value which generates dependence 
between those individuals who share frailties. 
The shared frailty model can be written as follows: 



Xik{t/X,^, u) = UiXikit) = Ao(i)e^*^+"^ 



(4) 



Where Xjn^ is the conditional hazard function for the 
k'^^ subject from the f'' cluster (conditional on m,); Xo{t) 
is the baseline hazard; /S is the fixed effects vector of di- 
mension p; Xii^ is the vector of covariates; m, is the random 
effect for the f'^ cluster. Subjects in the same cluster u 
share the same frailty factor [22]. It is a conditional hazard 
model, given the m,. The cluster may represent a family, 
for example, or as in this case a single subject for which 
multiple episodes are observed. 

The distribution of u may be Gamma, Gaussian, or 
other distribution. The gamma distribution has been 
chosen because of its mathematical tractability and be- 
cause it is widely used [22]. The one-parameter chosen 
gamma distribution is defined as: 



/.(«) 



vi/e-ie(-M/0) 

e''r(i/6i) 



(5) 



with r the gamma function. Note that E(u) = 1 and 
Var(u) = 6. This gives the following interpretation: subject 
in a class i with m, > 1 are frail, meaning of higher risk 
while subject with m, <1 are strong, meaning of lower risk. 
The parameter 9 informs on the clusters or classes hetero- 
geneity in the population. 
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Table 1 Data structures for modelling recurrent time-to-event outcomes 


ID Start 


End 


Episode 


Order 


Time 


Treatment 


Age (Years) 


Quarter 


1 0 


28 


1 


1 


28 


AS + SP 


3.93 


1 


1 42 


52 


1 


2 


10 


AS-SP 


3.93 




1 476 


700 


0 


10 


224 


AS + SP 


3.93 




2 0 


77 


1 


1 


77 


AS + AQ 


1.15 




2 91 


375 


1 


2 


284 


AS + AQ 


1.15 




2 417 


700 


0 


4 


283 


AS + AQ 


1.15 




3 0 


28 


1 


1 


28 


AL 


1.48 




3 42 


78 


1 


2 


36 


AL 


1.48 




3 150 


700 


0 


5 


550 


AL 


1.48 





Data dictionary: 

ID: study subject identification number; start: tlie start time of the interval (in days}; end: the time (in days) at which the event occurs or the time of censoring; 
episode, the occurrence of malaria episode (yes = 1, no = 0); order: the order of the episodes, which is used only for the PWP-CP model; time: the number of days 
at risk that is calculated from subtracting end from start variables; treatment: the same malaria treatment given to the patient during each episode; Age (Years): 
the patients age at enrolment in years; quarter: the resident place or bloc of the patient in the village (old quarter = 1, new quarter = 0). 



As in the Cox model or its extensions, the baseline 
hazard function for the frailty model does not vary by 
event, but the coefficient estimates of covariates effect 
from the frailty model, unlike the Cox model may vary if 
there is a significant random effect. 

For the baseline hazard function, although other distri- 
bution could be used, the Weibull proportional hazards 
distribution was assumed. Weibull distributed event 
times are often used in practice, because they are able to 
describe the actual evolution of the hazard function in 
an appropriate way in many circumstances. Furthermore 
it is a popular flexible parametric model that allows the 
inclusion of covariates of the survival times [22] . 

Data structure 

The duration of each subject in the study was defined as 
the time between enrolment and the end of the study or 
until the subject is lost to follow or withdrawn. A mal- 
aria episode (event) had to be preceded and followed by 
a time period without malaria except in the case of with- 
drawal and at the end of the study period. As an ex- 
ample. Table 1 provides data for three study subjects 
(one in each study arm). Subject 1 had nine malaria epi- 
sodes at days 28, 52, 80, 109, 305, 326, 410, 438 and 462 
and the follow-up ended on day 700. Subject 2 had three 
malaria episodes at days 77, 375 and 403 and the follow- 
up ended on day 700. Subject 3 had four malaria epi- 
sodes at days 28, 78, 105 and 136 and the follow-up 
ended on day 700. The duration of each malaria episode 
is 14 days as it is assumed that the subject is not at risk 
of new infection for this duration after treatment initi- 
ation (discontinuous risk interval data structure). 

The data are organized as one record per subject per 
event. The data structure for the AG-CP model consists 
of the first four columns. A subject with multiple events 
is considered as multiple subjects for analytic purposes. 



For example, subject 2 is considered four times: the first 
begins follow-up at time 0 and has an event at 77 days; 
according to the fact that the subject is not at risk during 
14 days, the second has delayed entry at 91 days and has 
an event at 375 days; the third has delayed entry at 
389 days and has an event at 403 and is followed through 
700 days without having an event. Because the counting 
process model does not consider the order of the events, 
it does not use the "order" column. In the PWP-CP model, 
a subject is assumed not to be at risk for a subsequent 
event until the current event has terminated. This means, 
one cannot be at risk for the second event without having 
experienced and completed the first event. The data struc- 
ture for PWP-CP model is similar to that of the counting 
process AG-CP model except that the "order" column is 
also used to identify the event order. An Additional file 1 
shows statistical codes for each model using Stata and R 
Software [see Additional file 1]. 

Incidence rate and relative risk estimate of recurrent 
events 

The incidence rate was computed as the number of 
events per person-days [24] and can be calculated from 
Table 1 as follows: 

n 

event ij 

k=l 

n 

timeij 

where eventij is the event status (1 or 0) for the subject 
in the interval; timeij is the time at risk for the sub- 
ject in the interval; n is the number of subjects. The 
relative risks (RRs) were computed to assess the treatment 
effect (AL study arm was used as the reference treatment) 
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and the age group effect (age group < 5 years old, 5-9 years 
old and >9 years old. The age group >9 years old was used 
as reference group) on the occurrence of malaria episode. 
The hazard ratio (HR) for extended Cox models and the 
relative risk (RR) for the GEE model were estimated. 

Ethical considerations 

The study has been approved from the institutional ethical 
committee (FWA #00001769) at the Faculty of Medicine, 
Pharmacy and Odonto-Stomatogy (FMPOS)/USTTB, 
Bamako, Mali. 

A written consent was also obtained from each partici- 
pant or their parent/legal guardian. 

Results 

From July 2005 to July 2007, the 777 subjects enrolled 
into the study yielded a total of 1,649 malaria episodes 
(min = 1, max = 12, median = 2 episodes per subject). 
Using the discontinuous risk interval analysis, PWP-CP, 
AG-CP, and the Shared gamma frailty models provided 
larger treatment effect on malaria episodes compared to 
GEE for the patients treated with AS + AQ or AS + SP 
as compared to the AL arm; RRs were: 0.75 (95% CI 
(Confidence Interval): 0.62-0.89), 0.74 (95% CI: 0.62- 
0.88) respectively for AG-CP model, 0.76 (95% CI: 0.64- 
0.89), 0.74 (95% CI: 0.62-0.87) for the Shared gamma 
frailty model and 1.02 (0.93-1.11), 0.93 (0.87-0.99) for 
GEE model (Table 2). Similarly for the age category, using 
the discontinuous (Table 2) risk interval analysis, PWP- 
CP, AG-CP and the Shared gamma frailty models provided 
similar and higher magnitude of RRs for the patients in 
age group <5 years old or age group between 5-9 years old 
compared to patients of age group >9 years old; RRs were: 
3.16 (95% CI: 2.15-4.65), 2.61 (95% CI: 1.76-3.88) respect- 
ively for AG-CP model and 3.04 (95% CI: 2.27-4.09), 2.54 
(95% CI: 1.87-3.45) respectively for Shared gamma frailty 
model. The effect of covariates (treatments and age 
category) on malaria episodes were slightly higher for 
both AG-CP and the Shared gamma frailty models in 



discontinuous risk intervals (Table 2) compared to 
continuous risk intervals (Table 3). 

Using both discontinuous (Table 2) and continuous 
(Table 3) risk intervals analysis, GEE Poisson models did 
not find a preventive efficacy for the patients treated 
with AS + AQ arm compared to AL arm. Furthermore, 
the GEE models estimated a protective efficacy (1-RR) of 
lower magnitude for AS + SP than the 3 other models. 
The discontinuous (Table 2) and continuous (Table 3) 
risk intervals analysis results were similar for GEE 
Poisson model. 

Incidence rates (Table 4) were slightly higher in the 
discontinuous interval analysis compared to the continu- 
ous interval analysis as the person-time was lower, al- 
though the 5% significance level was not reached for the 
incidence rate differences between treatment groups 
(exact mid p-values). 

Assessing the impact of risk exposure covariate with 
the simulated data (Figure 1), treatment effect estimates 
(AS + AQ and AS + SP compared to AL) were relatively 
lower than those with observed data for the AG-CP and 
the Shared gamma frailty models, but still remain signifi- 
cant for each extended Cox models. For the GEE model, 
there were no significant treatment at 5% for both AS + AQ 
and AS + SP compared to AL. Simulated data (Figure 1) 
confirmed the significant (at 5% significant level) treat- 
ment effect of AS + AQ and AS + SP compared to AL with 
power > 80% for all models except the GEE Poisson dis- 
tribution model. Also, the age category effects (age cat- 
egory < 5 years and age category 5-9 years old compared 
to the age category > 9 years old) on malaria episodes were 
relatively lower compared to the effects of the age category 
on malaria episodes using the observed data analysis. For 
the GEE model, there were moderate to no significant age 
category effects. 

The simulated data (Figure 1) showed that adding a 
significant covariate reduces estimation variances. For 
the AS + AQ treatment, the standard error ratio (simu- 
lated over observed) for the AG-CP model was 0.72 
(2.5-97.5 percentile [0.60-0.85]), while it was 0.16 (2.5- 



Table 2 Coefficient estimates according to model by discontinuous risk intervals analysis 



Models 


AS-l-AQ* RR/HR (SE); 
[95% CI]; p 


AS-l-SP* RR/HR (SE); 
[95% CI]; p 


<5 years** RR/HR (SE); 
[95% CI]; p 


5-9 years** RR/HR (SE); 
[95% CI]; p 


GEE, Poisson distribution 


1.02 (0.044); [0.93-1.11]; 


0.93 (0.029); [0.87- 0.99]; 


1.36 (0.201); [1.02-1.82]; 


1.22 (0.181); [0.91-1.63]; 




p = 0.722 


p = 0.018 


p = 0.036 


p = 0.175 


AG-CP 


0.75 (0.068); [0.62-0.89]; 


0.74 (0.065); [0.62-0.88]; 


3.16 (0.621); [2.15-4.65]; 


2.61 (0.526); [1.76-3.88]; 




p < 0.001 


p < 0.001 


p < 0.001 


p < 0.001 


PWP-CP 


0.86 (0.055); [0.76-0.97]; 


0.85 (0.052); [0.75-0.96]; 


2.34 (0.389); [1.69-3.24]; 


2.04 (0.345); [1 .46-2.84]; 




p = 0.015 


p = 0.007 


p < 0.001 


p < 0.001 


Sliared gamma frailty 


0.76 (0.064); [0.64-0.89]; 


0.74 (0.063); [0.62-0.87]; 


3.04 (0.458); [2.27-4.09]; 


2.54 (0.397); [1.87-3.45]; 




p = 0.001 


p < 0.001 


p < 0.001 


p < 0.001 



*Reference treatment: AL. 



**Reference age group: >9 years old. 

CI: Confidence interval for RR/HR (Relative risk/Hazard ratio); SE: Standard error; p: p value. 
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Table 3 Coefficient estimates according to model by continuous risk intervals analysis 



Models 


AS-l-AQ* RR/HR (SE); 
[95% CI]; p 


AS-l-SP* RR/HR (SE); 
[95% CI]; p 


<5 years** RR/HR (SE); 
[95% CI]; p 


5-9 years** RR/HR (SE); 
[95% CI]; p 


GEE, Poisson distribution 


1.02(0.044); [0.93-1.11]; 


0.93 (0.029); [0.87- 0.99]; 


1.36 (0.201); [1.02-1.82]; 


1.22 (0.181); [0.91-1.63]; 




p = 0.722 


p = 0.02 


p = 0.04 


p = 0.18 


AG-CP 


0.77 (0.064); [0.65-0.91]; 


0.76 (0.062); [0.65-0.89]; 


2.94 (0.559); [2.03-4.27]; 


2.48 (0.481); [1.69-3.62]; 




p = 0.002 


p = 0.001 


p < 0.001 


p < 0.001 


PWP-CP 


0.83 (0.053); [0.73-0.94]; 


0.81 (0.050); [0.72-0.92]; 


2.35 (0.388); [1.71-3.26]; 


2.05 (0.344); [1.48-2.85]; 




p = 0.004 


p = 0.001 


p < 0.001 


p < 0.001 


Sliared gamma frailty 


0.77 (0.061); [0.66-0.90]; 


0.76 (0.060); [0.65-0.89]; 


2.88 (0415); [2.17-3.82]; 


2.43 (0.364); [1.82-3.26]; 




p = 0.001 


p < 0.001 


p < 0.001 


p < 0.001 



*Reference treatment: AL. 



**Reference age group: >9 years old. 

CI: Confidence interval for RR/HR (Relative risk/Hazard ratio); SE: Standard error; p: p value. 



97.5 percentile [0.13-0.24]) for GEE Poisson distribution 
model. For the AS -i- SP treatment, the standard error ra- 
tio (simulated over observed) for the AG -CP model was 
0.75 (2.5-97.5 percentile [0.63-0.89]), while it was 0.25 
(2.5-97.5 percentile [0.23-0.54]) for GEE Poisson distribu- 
tion model. The AS + SP effect estimates was clearly modi- 
fied when using the GEE model (ECR = 7.7%), similarly to 
small age categories effect estimates when using the Shared 
gamma frailty model (ECR = 16.1%). AG-CP and PWP-CP 
models estimates were particularly stable after the addition 
of the simulated confounding covariate. 

Discussion 

Methods are available to analyse data with recurrent 
events while accounting for the lack of independence 
among events [3-6,8-13]. In this paper, when focusing 
on recurrent malaria episodes data, results were different 
according to the model used, highlighting the importance 
of the model choice according to the medical question 
studied and the collected dataset. Two other survival 
models for recurrent events [3,8,20] have been proposed: 
Wei-Lin-Weissfeld (WLW), Prentice-Williams-Peterson 
gap time (PWP-GT). Various earlier studies have dis- 
cussed the difficulty in conceptualizing the risk set of the 
WLW model and the biases in the estimates from the 
WLW and PWP-GT models [3,6,25], so these two models 
were not considered here. There is a limited applied 
statistical research modelling malaria recurrent episodes 
data [12], though malaria is one of the most devastating 



diseases in sub-Saharan Africa. The literature review indi- 
cates that, this is the first applied statistical research mod- 
elling using the main extended Cox models on recurrent 
malaria episodes data with both discontinuous and con- 
tinuous risk intervals. The data structures have been effi- 
ciently prior organized with discontinuous risk intervals in 
order to perform the analysis using these different models. 
For these analyses, the time while the subject is not at risk 
because of the malaria treatment has to be taken into ac- 
count and excluded from the risk set. There is a need of 
careful preparation of the data structure before applying 
theses analyses. 

The AG-CP and the Shared gamma frailty models, 
with both discontinuous and continuous risk intervals 
provided similar parameter estimates of treatment ef- 
fects (with respect to the referent treatment) or for age 
category effects (in respect to age referent category) on 
malaria recurrent episodes. This was previously observed 
by others [8] where the authors reported significant co- 
variate effect estimates using AG-CP and the Shared 
gamma frailty models with discontinuous risk intervals 
analysis. AG-CP models are known to be useful and 
robust if one is interested in the overall effect, such as 
the treatment effect in a clinical trial, and if there is no 
clear biological mechanism underlying the relation be- 
tween the first event and subsequent events [8,11]. The 
actual used data seems adapted to this described situation 
where having the first malaria episode does not preclude 
subsequent malaria episode as long as exposure is present 



Table 4 Incidence rate* per treatment arm according to discontinuous or continuous risk intervals analysis 

Model AL AS-l-AQ AS-l-SP 

IR [95% CI] IR [95% CI] IR [95% CI] 

Discontinuous time risk intervals 2.01 [1.86-2.17] 1.52 [1.39-1.66] 1 .50 [1 .37-1 .64] 

Continuous time risl< intervals 1 .87 [1 .73-2.02] 1.44 [1.32-1.57] 1.42 [1.30-1.55] 

p-value 0.09 0.19 0.19 

*Malaria episodes/person/year. 

IR: Incidence rate; CI: Confidence interval for IR; p-values: Exact mid p-values for risl< difference between discontinuous and continuous risk intervals. 
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Figure 1 Coefficients estimate using observed and simulated data using discontinuous risk intervals analysis. Dark dots with lines are 
Relative risk/Hazard ratio and their 95% Confidence intervals respectively; Vertical central dark line is either the no effect treatment line compared 
to the referent category (artemether-lumefantrine) or the no effect age group compared the referent category (age group > 9 years old). 
Abbreviations: AS + AQ, artesunate + amodiaquine; AS + SP, artesunate + sulphadoxine-pyrimethamine; AG, Andersen-Gill; PWP, Prentice-Williams- 
Peterson; GEE, generalized estimating equation; SE, standard error; ECR, empirical coverage rate. 



(transmission condition), and is taken into account. The 
counting process (or AG-CP) model requires few assump- 
tions, and is comparably robust like the traditional Cox re- 
gression model [3,18]. Although the Shared gamma frailty 
models generated similar results to the AG-CP model in 
this study, the specification of the frailty distribution may 
affect the coefficient estimates and more research is still 
needed in this area [8,22]. 

AS -I- SP, then followed by AS -i- AQ showed significant 
protective effects (1-RR) against recurrent malaria episodes 
compared to AL using the three extended Cox models 
(AG-CR PWP-CP and the Shared gamma frailty). This 
could be explained by the longer half-life of the partner 
drugs (AQ and SP) and this observation has been reported 



previously [7]. In contrast, both the age category of <5 years 
old and the age category of 5-9 years old were significantly 
at higher risk of recurrent malaria episodes compared to 
old children and adults (>9 years old) using the three 
extended Cox models (AG-CP PWP-CP and the Shared 
gamma frailty). 

Time discontinuous or continuous risk intervals models 
should be chosen based on the disease or outcome condi- 
tions. In this case, the discontinuous risk intervals model 
should be chosen for unbiased estimates of covariate coef- 
ficients as shown in these data and also reported previously 
[8]. It is also more appropriate from an epidemiological 
point of view to take into account the time a subject is not 
at risk for a disease in a given period of time. 
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Less or no significant covariate effects (with respect to 
referent category) were found on malaria recurrent epi- 
sodes using GEE for both discontinuous and continuous 
risk intervals models which yielded similar coefficient es- 
timates. GEE is considered a very flexible approach [26]. 
Indeed, GEE models can handle a variety of correlated 
measure models that arise from recurrent episodes 
measure in the same individuals over time [27-29]. GEE 
models are robust to misspecification of the correlations 
structures [26]. Selecting robust standard errors (Huber/ 
White Sandwich Estimators; as opposed to conventional 
standard errors) allow the estimates to be valid even if 
the correlation structure is misspecified. The poor per- 
formance of GEE, shown here, may also be explained by 
inappropriate assumption of the Poisson distribution 
[30] as the data maybe underdispersed with 70% of the 
dependant value count greater than zero. 

Analyses including the simulated variables also showed 
significant treatments (AS + AQ and AS + SP using AL 
as referent category) and age category (using >9 years 
old as referent category) effects on malaria recurrent ep- 
isodes with AG-CP PWP-CP and the Shared gamma 
frailty models as in observed data analysis though these 
effects were more important for AG-CP and the Shared 
gamma frailty models. 

The simulation study showed that a risk exposure factor 
modified the estimates, reducing the treatment effects. In 
malaria studies conducted in sub-Saharan countries, the 
main risk exposure factor includes the house location of 
subjects [14,15] due to specific environments, which influ- 
ence malaria transmission. The distribution of mosquito 
breeding sites within villages is not uniform, thus, the ex- 
posure varies within villages, and this must be taken into 
account when assessing treatment effects. As shown in 
Figure 1, the simulation data provided proof of an existing 
confounding factor that needs to be taken into account. 
In fact, the RRs and their confidence intervals esti- 
mated on simulated data trended toward the null hy- 
pothesis (more close for RRs confidence intervals to 
include 1) compared to the RRs and their confidence in- 
tervals estimated on observed data. These variations were 
more important for the AG-CP and the Shared gamma 
frailty models despite their power (ability to detect signifi- 
cant effect) than in the PWP-CP and GEE Poisson distri- 
bution models. But, according to the empirical coverage 
rate as described by Burton et al. [31], the PWP-CP 
remained the more robust to the addition of the simulated 
confounding factor. 

Conclusions 

In the context of malaria, statistical models have to take 
recurrent events and discontinuous time into account to 
better estimate effects of covariates such as treatments or 
risk factors such as age on malaria recurrent episodes. In 



this context, the three extended Cox models presented 
here are of high interest and showed similar results. 

Additional file 



Additional file 1: Stata (Stata Corporation 201 1) and R codes for 
tlie different models. 
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